
    Model Assessment Tools for a Model False World

    A standard goal of model evaluation and selection is to find a model that approximates the truth well while at the same time being as parsimonious as possible. In this paper we emphasize the point of view that the models under consideration are almost always false, if viewed realistically, and so we should analyze model adequacy from that point of view. We investigate this issue in large samples by looking at a model credibility index, which is designed to serve as a one-number summary measure of model adequacy. We define the index to be the maximum sample size at which samples from the model and those from the true data generating mechanism are nearly indistinguishable. We use standard notions from hypothesis testing to make this definition precise. We use data subsampling to estimate the index. We show that the definition leads us to some new ways of viewing models as flawed but useful. The concept is an extension of the work of Davies [Statist. Neerlandica 49 (1995) 185-245]. Comment: Published at http://dx.doi.org/10.1214/09-STS302 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org).
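    As an illustration of the idea (not the authors' procedure), the sketch below estimates a credibility-style index by drawing subsamples of increasing size, testing each against the fitted model, and reporting the largest size at which the model is still rarely rejected. The function name, the use of the Kolmogorov-Smirnov test, and the rejection-rate cutoff are all assumptions made for this sketch.

```python
import numpy as np
from scipy import stats

def credibility_index(data, model_cdf, sizes=(25, 50, 100, 200, 400, 800),
                      n_subsamples=200, alpha=0.05, reject_tol=0.10, seed=0):
    """Illustrative credibility-style index: the largest subsample size at which
    the candidate model is rejected (KS test at level alpha) in no more than a
    fraction reject_tol of subsamples. A hypothetical sketch, not the paper's method."""
    rng = np.random.default_rng(seed)
    best = 0
    for n in sizes:
        if n > len(data):
            break
        rejections = 0
        for _ in range(n_subsamples):
            sub = rng.choice(data, size=n, replace=False)
            rejections += stats.kstest(sub, model_cdf).pvalue < alpha
        if rejections / n_subsamples <= reject_tol:
            best = n      # model and data still look nearly indistinguishable
        else:
            break         # lack of fit becomes detectable at this sample size
    return best

# Example: data from a t(5) distribution, candidate model a fitted normal.
rng = np.random.default_rng(1)
data = rng.standard_t(df=5, size=5000)
fitted = stats.norm(loc=data.mean(), scale=data.std())
print(credibility_index(data, fitted.cdf))
```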

    Building and using semiparametric tolerance regions for parametric multinomial models

    We introduce a semiparametric "tubular neighborhood" of a parametric model in the multinomial setting. It consists of all multinomial distributions lying in a distance-based neighborhood of the parametric model of interest. Fitting such a tubular model allows one to use a parametric model while treating it as an approximation to the true distribution. In this paper, the Kullback-Leibler distance is used to build the tubular region. Based on this idea, one can define the distance between the true multinomial distribution and the parametric model to be the index of fit. The paper develops a likelihood ratio test procedure for testing the magnitude of the index. A semiparametric bootstrap method is implemented to better approximate the distribution of the LRT statistic. The approximation permits more accurate construction of a lower confidence limit for the model fitting index. Comment: Published at http://dx.doi.org/10.1214/08-AOS603 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
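    The index of fit can be illustrated with a small numerical sketch: compute the Kullback-Leibler distance between the observed cell proportions and a fitted parametric multinomial model, and note that 2n times this distance is the usual likelihood ratio statistic. The binomial example, the cell counts, and the helper names below are invented for illustration; the paper's tubular-model fitting and semiparametric bootstrap are not reproduced here.

```python
import numpy as np
from scipy import stats, optimize

def kl_index(counts, model_probs):
    """KL divergence between the empirical multinomial distribution and a
    fitted parametric model: an illustrative 'index of fit'."""
    n = counts.sum()
    phat = counts / n
    mask = phat > 0                      # 0 * log(0/p) = 0 by convention
    return np.sum(phat[mask] * np.log(phat[mask] / model_probs[mask]))

# Hypothetical example: cell counts fitted by a Binomial(4, theta) model.
counts = np.array([18, 55, 72, 41, 14])
k = np.arange(5)

def binom_probs(theta):
    return stats.binom.pmf(k, 4, theta)

# Maximum likelihood fit of theta under the parametric model.
theta_hat = optimize.minimize_scalar(
    lambda t: -np.sum(counts * np.log(binom_probs(t))),
    bounds=(1e-6, 1 - 1e-6), method="bounded").x

index = kl_index(counts, binom_probs(theta_hat))
print(f"theta_hat = {theta_hat:.3f}, KL index = {index:.4f}")
print(f"likelihood ratio statistic 2*n*KL = {2 * counts.sum() * index:.2f}")
```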

    The topography of multivariate normal mixtures

    Multivariate normal mixtures provide a flexible method of fitting high-dimensional data. It is shown that their topography, in the sense of their key features as a density, can be analyzed rigorously in lower dimensions by use of a ridgeline manifold that contains all critical points, as well as the ridges of the density. A plot of the elevations on the ridgeline shows the key features of the mixed density. In addition, by use of the ridgeline, we uncover a function that determines the number of modes of the mixed density when there are two components being mixed. A follow-up analysis then gives a curvature function that can be used to prove a set of modality theorems. Comment: Published at http://dx.doi.org/10.1214/009053605000000417 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
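    As a numerical illustration of the ridgeline idea in the two-component case, the sketch below traces the curve x*(alpha) = [(1-alpha) S1^{-1} + alpha S2^{-1}]^{-1} [(1-alpha) S1^{-1} mu1 + alpha S2^{-1} mu2] for alpha in [0, 1] and evaluates the mixture density along it; local maxima of this elevation profile locate the modes. The specific means, covariances, and mixing weight are arbitrary choices for the sketch, not values from the paper.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Two-component normal mixture: w * N(mu1, S1) + (1 - w) * N(mu2, S2).
# Parameter values are arbitrary, chosen only to illustrate the sketch.
mu1, mu2 = np.array([0.0, 0.0]), np.array([3.0, 2.0])
S1 = np.array([[1.0, 0.5], [0.5, 1.0]])
S2 = np.array([[1.0, -0.4], [-0.4, 2.0]])
w = 0.45

S1inv, S2inv = np.linalg.inv(S1), np.linalg.inv(S2)

def ridgeline(alpha):
    """Point x*(alpha) on the ridgeline curve joining the two component means."""
    A = (1 - alpha) * S1inv + alpha * S2inv
    b = (1 - alpha) * (S1inv @ mu1) + alpha * (S2inv @ mu2)
    return np.linalg.solve(A, b)

def mixture_pdf(x):
    return (w * multivariate_normal.pdf(x, mu1, S1)
            + (1 - w) * multivariate_normal.pdf(x, mu2, S2))

# Elevation profile: the mixture density evaluated along the ridgeline.
alphas = np.linspace(0.0, 1.0, 401)
elev = np.array([mixture_pdf(ridgeline(a)) for a in alphas])

# Local maxima of the elevation profile (endpoints included) locate the modes.
padded = np.concatenate(([-np.inf], elev, [-np.inf]))
peaks = (padded[1:-1] > padded[:-2]) & (padded[1:-1] > padded[2:])
print("approximate number of modes:", int(peaks.sum()))
```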

    Estimating the number of classes

    Estimating the unknown number of classes in a population has numerous important applications. In a Poisson mixture model, the problem is reduced to estimating the odds that a class is undetected in a sample. The discontinuity of the odds prevents the existence of locally unbiased and informative estimators and restricts confidence intervals to be one-sided. Confidence intervals for the number of classes are also necessarily one-sided. A sequence of lower bounds to the odds is developed and used to define pseudo maximum likelihood estimators for the number of classes. Comment: Published at http://dx.doi.org/10.1214/009053606000001280 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
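    For a concrete, if much simpler, example of an estimator built from a lower bound of this general kind, the sketch below computes Chao's classical lower-bound estimate of the number of classes from the counts of singleton and doubleton classes. It is shown only for illustration and is not the pseudo maximum likelihood estimator developed in the paper; the sampled population below is also invented.

```python
import numpy as np
from collections import Counter

def chao1(abundances):
    """Chao's classical lower-bound estimate of the number of classes,
    computed from singleton and doubleton counts. Shown only as an
    illustration of a lower-bound-based estimator, not the paper's method."""
    abundances = np.asarray(abundances)
    s_obs = int((abundances > 0).sum())
    f1 = int((abundances == 1).sum())    # classes seen exactly once
    f2 = int((abundances == 2).sum())    # classes seen exactly twice
    if f2 == 0:
        return s_obs + f1 * (f1 - 1) / 2.0   # bias-corrected form when f2 = 0
    return s_obs + f1 * f1 / (2.0 * f2)

# Hypothetical example: sample labels from a population with 300 classes.
rng = np.random.default_rng(0)
true_classes = 300
class_probs = rng.dirichlet(np.ones(true_classes))
labels = rng.choice(true_classes, size=1000, p=class_probs)
abundances = np.array(list(Counter(labels).values()))
print("observed classes:", len(abundances),
      " Chao1 lower-bound estimate:", round(chao1(abundances), 1))
```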

    Comparison of satellite-based cloud retrieval methods for cirrus and stratocumulus

    One difficulty in using satellite remote sensing data is the spatial variability of cloud properties on scales smaller than most meteorological satellite fields of view (approx. 4 to 8 km). The variation of satellite-derived cloud cover is examined as a function of the satellite sensor spatial resolution for seven cloud cover retrieval methods: (1) Reflectance threshold; (2) Temperature threshold; (3) ISCCP; (4) HBTM (Hybrid Bispectral Threshold Method); (5) NCLE; (6) Spatial coherence; and (7) Functional Box Counting. The first two methods are simple single-spectral thresholds which classify a satellite pixel as cloud filled if the measured reflectance is greater than the threshold, or if the measured equivalent blackbody temperature is less than the threshold. The next three methods are bispectral, using one visible wavelength window channel and one thermal infrared wavelength window. The final two algorithms rely on the spatial variability within the cloud field to determine cloud cover. Spatial coherence assumes only that the cloud field occurs in a single layer and that the clouds are optically thick in the infrared window. LANDSAT Thematic Mapper (TM) data are used to test the spatial resolution dependence of the cloud algorithms. The ISCCP bispectral threshold applied to the full resolution data is used as the reference or truth cloud cover, after which the retrieval methods are applied at the coarser spatial resolutions. Studies of the fraction of pixels in the scene at cloud edge, and of the profile of reflectance and temperature near cloud edges, indicate an uncertainty in the reference cloud fraction of 1 to 5 percent.
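    The single-spectral threshold methods lend themselves to a very small sketch: apply a fixed reflectance threshold to a scene, then aggregate pixels into larger blocks to mimic a coarser field of view and observe how the retrieved cloud fraction changes. The synthetic scene, threshold value, and block sizes below are invented for illustration; no satellite data or operational algorithm is reproduced.

```python
import numpy as np

def threshold_cloud_fraction(reflectance, threshold=0.3):
    """Single-spectral reflectance threshold: a pixel is cloud-filled if its
    reflectance exceeds the threshold. Returns the scene cloud fraction."""
    return float(np.mean(reflectance > threshold))

def degrade(field, factor):
    """Average factor x factor blocks of pixels to mimic a coarser sensor
    field of view (fine pixels aggregated toward larger footprints)."""
    ny, nx = field.shape
    trimmed = field[:ny - ny % factor, :nx - nx % factor]
    return trimmed.reshape(ny // factor, factor, nx // factor, factor).mean(axis=(1, 3))

# Synthetic broken-cloud scene (illustrative only, not satellite data).
rng = np.random.default_rng(2)
clear = rng.uniform(0.05, 0.15, size=(512, 512))
cloud = rng.uniform(0.4, 0.8, size=(512, 512))
mask = rng.random((512, 512)) < 0.35          # roughly 35 percent cloud cover
scene = np.where(mask, cloud, clear)

for factor in (1, 4, 16, 64):
    cf = threshold_cloud_fraction(degrade(scene, factor))
    print(f"block size {factor:3d}: retrieved cloud fraction = {cf:.3f}")
```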

    Uncertainty in water resource planning: An economic evaluation of a water use reduction alternative


    Disconnected Loop Noise Methods in Lattice QCD

    A comparison of the noise variance between algorithms for calculating disconnected loop signals in lattice QCD is carried out. The methods considered are the Z(N) noise method and the Volume method. We find that the noise variance is strongly influenced by the Dirac structure of the operator. Comment: espcrc.sty file needed. Talk presented at Lattice '97, Edinburgh, Scotland.
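    The noise methods referred to here estimate traces involving the inverse Dirac matrix stochastically. The toy sketch below illustrates the Z(2) special case of Z(N) noise, estimating Tr(M^{-1}) for a small symmetric matrix standing in for the Dirac operator; the matrix, the solver, and the sample sizes are assumptions for the sketch, not a lattice QCD implementation.

```python
import numpy as np

def z2_noise_trace(minv_apply, dim, n_noise=200, seed=0):
    """Stochastic Z(2) noise estimate of Tr(M^{-1}): average of eta^T M^{-1} eta
    over noise vectors eta with i.i.d. +/-1 entries. A toy illustration of the
    noise method, not a lattice QCD implementation."""
    rng = np.random.default_rng(seed)
    estimates = np.empty(n_noise)
    for i in range(n_noise):
        eta = rng.choice([-1.0, 1.0], size=dim)
        estimates[i] = eta @ minv_apply(eta)   # one 'inversion' per noise vector
    return estimates.mean(), estimates.std(ddof=1) / np.sqrt(n_noise)

# Toy example: a small well-conditioned symmetric matrix standing in for the
# Dirac operator; in practice M^{-1} eta is obtained by solving M x = eta.
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 200))
M = A @ A.T / 200 + np.eye(200)

est, err = z2_noise_trace(lambda v: np.linalg.solve(M, v), M.shape[0])
print(f"noise estimate: {est:.2f} +/- {err:.2f}   exact: {np.trace(np.linalg.inv(M)):.2f}")
```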